Tools and Techniques for Decision Tree Learning
نویسنده
چکیده
Decision tree learning is an important field of machine learning. In this study we examine both formal and practical aspects of decision tree learning. We aim at answering to two important needs: The need for better motivated decision tree learners and an environment facilitating experimentation with inductive learning algorithms. As results we obtain new practical tools and useful techniques for decision tree learning. First, we derive the practical decision tree learner Rank based on the Findmin protocol of Ehrenfeucht and Haussler. The motivation for the changes introduced to the method comes from empirical experience, but we prove the correctness of the modifications in the probably approximately correct learning framework. The algorithm is enhanced by extending it to operate in the multiclass situations, making it capable of working within the incremental setting, and providing noise tolerance into it. Together these modifications entail practicability through a formal development process, which constitutes an important technique for decision tree learner design. The other tool that comes out of this work is TELA, a general testbed for all inductive learners using attribute representation of data, not only for decision tree learners. This system guides and assists its user in taking new algoritms to his disposal, operating them in an easy fashion, designing and executing useful tests with the algorithms, and in interpreting the outcome of the tests. We present the design rationale, current composition, and future development directions of TELA. Moreover, we reflect on the experiences that have been gathered in the initial usage of the system. The tools that come about are evaluated and validated in empirical tests over many real-world application domains. Several successful inductive algorithms are contrasted with the Rank algorithm in experiments that are carried out using TELA. These experiments let us evaluate the success of the new decision tree learner with respect to its established equivalents and validate the utility of the developed testbed. The tests prove successful in both respects: Rank attains the same overall level of prediction accuracy as C4.5, which is generally considered to be one of the best empirical decision tree learners, and TELA eases the execution of the experiments substantially.
منابع مشابه
Fault Detection of Anti-friction Bearing using Ensemble Machine Learning Methods
Anti-Friction Bearing (AFB) is a very important machine component and its unscheduled failure leads to cause of malfunction in wide range of rotating machinery which results in unexpected downtime and economic loss. In this paper, ensemble machine learning techniques are demonstrated for the detection of different AFB faults. Initially, statistical features were extracted from temporal vibratio...
متن کاملMMDT: Multi-Objective Memetic Rule Learning from Decision Tree
In this article, a Multi-Objective Memetic Algorithm (MA) for rule learning is proposed. Prediction accuracy and interpretation are two measures that conflict with each other. In this approach, we consider accuracy and interpretation of rules sets. Additionally, individual classifiers face other problems such as huge sizes, high dimensionality and imbalance classes’ distribution data sets. This...
متن کاملApplication of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
متن کاملAutomatic road crack detection and classification using image processing techniques, machine learning and integrated models in urban areas: A novel image binarization technique
The quality of the road pavement has always been one of the major concerns for governments around the world. Cracks in the asphalt are one of the most common road tensions that generally threaten the safety of roads and highways. In recent years, automated inspection methods such as image and video processing have been considered due to the high cost and error of manual metho...
متن کاملClassification of encrypted traffic for applications based on statistical features
Traffic classification plays an important role in many aspects of network management such as identifying type of the transferred data, detection of malware applications, applying policies to restrict network accesses and so on. Basic methods in this field were using some obvious traffic features like port number and protocol type to classify the traffic type. However, recent changes in applicat...
متن کامل